Parallelizing XML data-streaming workflows via MapReduce
Similar resources
Parallelizing XML data-streaming workflows via MapReduce
In prior work it has been shown that the design of scientific workflows can benefit from a collection-oriented modeling paradigm which views scientific workflows as pipelines of XML stream processors. In this paper, we present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the MapReduce framework. Pipelines in our approach consist...
Parallelizing XML Processing Pipelines via MapReduce
We present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the MapReduce framework. Pipelines in our approach consist of sequences of processing steps that consume XML-structured data and produce, often through calls to “black-box” functions, modified (i.e., updated) XML structures. Our main contributions are a set of strategies for...
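To make the idea of data parallelism in such pipelines more concrete, here is a minimal Python sketch of one pipeline step expressed as a map phase and a reduce phase. It is not the compilation strategy from the paper. It assumes the simplest case, where the top-level children of an XML collection can be processed independently by a "black-box" function. The names `map_step`, `reduce_step`, `run_pipeline_step`, and the black-box `annotate_length` are hypothetical, and the job runs sequentially in one process rather than on a MapReduce cluster.

```python
import copy
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Callable, Iterable, Iterator, List, Tuple

# (key, updated subtree) pairs emitted by the map phase
Pair = Tuple[int, ET.Element]


def map_step(root: ET.Element,
             black_box: Callable[[ET.Element], ET.Element]) -> Iterator[Pair]:
    """Map: apply the black-box function to each top-level collection item.

    Assumption: items are independent, so in a real MapReduce job each call
    could run on a different worker. Here they simply run one after another.
    The item's document position is used as the key so order can be restored.
    """
    for position, item in enumerate(root):
        yield position, black_box(copy.deepcopy(item))


def reduce_step(pairs: Iterable[Pair], root_tag: str) -> ET.Element:
    """Shuffle and reduce: group results by key and rebuild the collection in order."""
    groups: defaultdict = defaultdict(list)
    for key, elem in pairs:
        groups[key].append(elem)
    out = ET.Element(root_tag)  # note: attributes of the original root are not copied
    for key in sorted(groups):
        items: List[ET.Element] = groups[key]
        out.extend(items)
    return out


def run_pipeline_step(xml_text: str,
                      black_box: Callable[[ET.Element], ET.Element]) -> str:
    """Run one hypothetical pipeline step: parse, map, reduce, serialize."""
    root = ET.fromstring(xml_text)
    result = reduce_step(map_step(root, black_box), root.tag)
    return ET.tostring(result, encoding="unicode")


def annotate_length(elem: ET.Element) -> ET.Element:
    """Hypothetical black-box step: record the length of the element's text."""
    elem.set("len", str(len(elem.text or "")))
    return elem


if __name__ == "__main__":
    doc = "<collection><seq>ACGT</seq><seq>GG</seq></collection>"
    print(run_pipeline_step(doc, annotate_length))
```

Keying each result by its document position is one simple way to let the reduce phase reassemble an updated XML structure in the original order; grouping by a content-derived key would instead regroup the collection.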
Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce
Processing XML queries over big XML data using MapReduce has been studied in recent years. However, existing work focuses on partitioning XML documents and distributing XML fragments to different compute nodes. This approach may incur high overhead from transferring XML fragments from one node to another during MapReduce execution. Motivated by the structural join based XML query processin...
Parallelizing bioinformatics applications with MapReduce
Current bioinformatics applications require both management of huge amounts of data and heavy computation: fulfilling these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization technology that appears to be particularly well adapted to this task. Here we report on its application, using its open source implementation Hadoop, to two r...
On Parallelizing Streaming Algorithms
We study the complexity of parallelizing streaming algorithms (or equivalently, branching programs). If M(f) denotes the minimum average memory required to compute a function f(x1, x2, ..., xn), how much memory is required to compute f on k independent streams that arrive in parallel? We show that when the inputs (updates) are sampled independently from some domain X and M(f) = Ω(n), then com...
Journal
Journal title: Journal of Computer and System Sciences
Year: 2010
ISSN: 0022-0000
DOI: 10.1016/j.jcss.2009.11.006